Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source - Corpus Hybrid Models
نویسندگان
چکیده
Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus–thesaurus hybrid method that uses soft constraints to generate word-senseaware distributional profiles (DPs) from coarser “concept DPs” (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabularylimited: if the target word is not in the thesaurus, the method falls back gracefully on the word’s co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domainspecific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.
منابع مشابه
A Maximum Entropy Approach for Semantic Language Modeling
The conventional n-gram language model exploits only the immediate context of historical words without exploring long-distance semantic information. In this paper, we present a new information source extracted from latent semantic analysis (LSA) and adopt the maximum entropy (ME) principle to integrate it into an n-gram language model. With the ME approach, each information source serves as a s...
متن کاملEnhancing Document Clustering Using Hybrid Models for Semantic Similarity
Different document representation models have been proposed to measure semantic similarity between documents using corpus statistics. Some of these models explicitly estimate semantic similarity based on measures of correlations between terms, while others apply dimension reduction techniques to obtain latent representation of concepts. This paper proposes new hybrid models that combine explici...
متن کاملCross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of s...
متن کاملFine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models
Title of dissertation: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton, Doctor of Philosophy, 2009 Dissertation directed by: Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with lingu...
متن کاملSemantic Prosody: Its Knowledge and Appropriate Selection of Equivalents
In translation, choosing appropriate equivalent is essential to convey the right message from source-text to target-text, and one of the issues that may have a determinative role in appropriate equivalent choice is the semantic prosody (SP) behavior of words and the relation existing between the SP of a word and semantic senses (i.e. negativity, positivity or neutrality) of its collocations in ...
متن کامل